One-sided confidence regions for continuous cumulative distribution
functions are constructed using empirical cumulative distribution
functions and the generalized Kolmogorov-Smirnov distance. The band
width of such regions becomes narrower in the right or left tail of the
distribution. To avoid tedious computation of confidence levels and
critical values, an approximation based on the Poisson process is
introduced. This approximation provides a conservative confidence region;
moreover, the approximation error decreases monotonically to 0 as sample
size increases. Critical values necessary for implementation are given.
Applications are made to the areas of risk analysis, investment
modeling, reliability assessment, and analysis of fault-tolerant systems.
A POISSON PROCESS APPROXIMATION FOR
GENERALIZED K-S CONFIDENCE REGIONS

by 

Hossein Arsham
Douglas R. Miller 


1. INTRODUCTION AND SUMMARY 

For constructing confidence regions for a continuous cumulative
distribution function (cdf) $F(\cdot)$, based upon the empirical cdf
$F_n(\cdot)$ of sample size $n$, the Kolmogorov-Smirnov (K-S) distances
have been widely applied. One problem in applying the K-S distances is
that the constructed region has a constant band width for a given sample
size $n$ and significance level $(1-\alpha)$.

It is well known that by the definition of the empirical cdf, for each
$x$, $nF_n(x)$ is a binomial $(n, F(x))$ random variable. Therefore the
usual binomial confidence interval for $F(x)$, $x$ fixed, can be
obtained. This confidence interval is valid at the single point $x$
only and not for all $x$ simultaneously.
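
As a rough illustration of the pointwise interval just described, the sketch below computes a binomial interval for $F(x)$ at one fixed $x$. The Wilson score interval is used here as one common choice of "usual binomial confidence interval"; that choice, the function name `pointwise_interval`, and its parameters are assumptions made for illustration and are not taken from the paper.

```python
from math import sqrt
from statistics import NormalDist


def pointwise_interval(sample, x, conf=0.95):
    """Wilson score interval for F(x) at ONE fixed x (not simultaneous in x).

    Assumption: the Wilson interval stands in for the "usual binomial
    confidence interval"; the paper does not say which one it means.
    """
    n = len(sample)
    # n * F_n(x) counts observations <= x and is Binomial(n, F(x)).
    p_hat = sum(1 for v in sample if v <= x) / n
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    scale = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / scale
    half = z * sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / scale
    return max(0.0, center - half), min(1.0, center + half)


# Hypothetical data: 20 equally spaced points in (0, 1].
sample = [i / 20 for i in range(1, 21)]
low, high = pointwise_interval(sample, 0.5)
print(low, high)
```

The interval holds only at the chosen $x$; repeating it over many $x$ values does not give a simultaneous band, which is the gap the regions in this paper are meant to fill.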

The goal of this paper is to give a compromise between these
two extremes. For the point $x$ in either tail one can construct a
one-sided confidence region, either upper or lower, with a narrower
band in the neighborhood of $x$; that is, one has more confidence in the
one-tail probabilities at the expense of less confidence in the central
and other tail probabilities. Numerous statistics may be used to construct
such a confidence region, and these are discussed in Section 2.2.

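In this paper the confidence region has one of the following
desirable forms:

$$F(x) \le (\delta/n) + \gamma F_n(x), \quad \forall x,\ \gamma \ge 1,\ \delta > 0$$   (1.1)

$$F(x) \le (\delta/n) + \gamma F_n(x), \quad \forall x,\ 0 < \gamma \le 1,\ \delta > 0$$   (1.2)

$$F(x) \ge -(\delta/n) + \gamma F_n(x), \quad \forall x,\ \gamma \ge 1,\ \delta > 0$$   (1.3)

$$F(x) \ge -(\delta/n) + \gamma F_n(x), \quad \forall x,\ 0 < \gamma \le 1,\ \delta > 0$$   (1.4)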
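
To make the shapes (1.1) through (1.4) concrete, the following sketch evaluates the upper bound $\delta/n + \gamma F_n(x)$ and the lower bound $\gamma F_n(x) - \delta/n$ at the order statistics of a sample. The helper names `ecdf`, `upper_band`, and `lower_band`, and the clipping of the bounds to $[0,1]$, are illustrative assumptions; choosing $(\gamma, \delta)$ so that the band has a stated confidence is a separate step.

```python
def ecdf(sample):
    """Return the sorted sample and F_n evaluated at each order statistic."""
    xs = sorted(sample)
    n = len(xs)
    return xs, [(i + 1) / n for i in range(n)]


def upper_band(sample, gamma, delta):
    """Upper bound delta/n + gamma*F_n(x), forms (1.1)/(1.2), clipped to [0, 1]."""
    xs, fn = ecdf(sample)
    n = len(xs)
    return xs, [max(0.0, min(1.0, delta / n + gamma * f)) for f in fn]


def lower_band(sample, gamma, delta):
    """Lower bound gamma*F_n(x) - delta/n, forms (1.3)/(1.4), clipped to [0, 1]."""
    xs, fn = ecdf(sample)
    n = len(xs)
    return xs, [max(0.0, min(1.0, gamma * f - delta / n)) for f in fn]


# Hypothetical data and parameters, for illustration only.
xs, ub = upper_band([0.4, 0.1, 0.3, 0.2], gamma=2.0, delta=0.4)
print(list(zip(xs, ub)))
```

With $\gamma > 1$ the gap between the upper bound and $F_n$ is $\delta/n + (\gamma - 1)F_n(x)$, which is smallest where $F_n$ is small, so the band is tightest in the left tail.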


The distribution of the generalized K-S statistics may be used to obtain
the significance level $(1-\alpha)$ of these desired confidence regions.
Although the closed form of $\alpha$ in terms of $(\gamma,\delta,n)$ is
available, it is computationally difficult to use and would require
extensive tables with three entries. We show instead that a meaningful
upper bound on the value of $\alpha$ can easily and quickly be computed
based on the Poisson process. With this Poisson process approximation, a
conservative confidence region of the desired shape is obtained.
Moreover, it is shown that the error committed by this approximation
becomes monotonically smaller as the sample size grows larger. In the
following sections we provide the relevant background leading to the use
of the generalized K-S distances, describe the difficulties involved in
implementing such distances, and prescribe the Poisson process
approximation to overcome these difficulties. Some areas of application
are identified, and tables and graphs are provided, along with examples
of how they are used.

2. CONFIDENCE REGION WITH NONCONSTANT WIDTH

Let $F(\cdot)$ be a cumulative distribution function, continuous on
$R^1$. The ordered sample from this distribution function will be
denoted by $X_{(i)}$, and the related empirical cdf by $F_n(\cdot)$.
Let $D$ denote a general "distance" between the two distribution
functions $F(\cdot)$, $F_n(\cdot)$ (we use "distance" in a
nonmathematical sense, essentially different from the mathematical
conception of "norm"). Then $D(F,F_n)$ is said to be distribution free
in the family of continuous $F$ if and only if

$$P[D(F,F_n) \le d] = P[D(F_n(F^{-1}),U) \le d], \quad d \in R^1$$   (2.1)

where $U(\cdot)$ denotes the cdf of the uniform [0,1] random variable. In
the following subsections we explain how some distribution-free
distances are used to construct a confidence region over $F(\cdot)$
based on $F_n(\cdot)$. Most of the distances we used in our study are
those which, under a simple null hypothesis on the form of $F(\cdot)$,
$F$ continuous, become the usual statistics widely discussed and used in
the goodness-of-fit literature.


2.1 The Generalized K-S Distances 

The generalized K-S distances are defined to be [see, for example,
Dempster (1959), Dwass (1959), or Pyke (1959)]

$$D_n^+(\gamma) = \sup_x\,[\gamma F(x) - F_n(x)]$$   (2.2)

$$D_n^-(\gamma) = \sup_x\,[F_n(x) - \gamma F(x)]$$   (2.3)

Arsham (1982) tabulated the right tail distribution of these distances
for some values of $n$ and $\gamma$. Using $D_n^+(\gamma)$ one can
construct a confidence region of the form
$F(x) \le \delta/n + \gamma F_n(x)$ having a narrower band over either
the left ($\gamma \ge 1$) or the right ($0 < \gamma \le 1$) tail.
Similarly, a confidence region of the form
$F(x) \ge \gamma F_n(x) - \delta/n$ can be obtained by utilizing the
distribution of $D_n^-(\gamma)$. The confidence region becomes narrower
over either the left ($0 < \gamma \le 1$) or the right ($\gamma \ge 1$)
tail. In the following we illustrate how such confidence regions can
be constructed.

Suppose one is interested in constructing a lower confidence
region that narrows over the right tail,
$F(x) \ge \gamma F_n(x) - \delta/n$, $\forall x$, $\gamma \ge 1$,
$\delta > 0$. The significance level can be obtained by noting that

$$P[D_n^- \le \delta/(n\gamma)] = P[F(x) \ge \gamma F_n(x) - \delta/n,\ \forall x] = 1 - \alpha, \quad \gamma \ge 1,\ \delta > 0 .$$

By the standard distribution-free argument, this probability can be
written as

$$P[D_n^- \le \delta/(n\gamma)] = P[U_n(x) \le x/\gamma + \delta/(n\gamma),\ 0 \le x \le 1] = P_n(\gamma,\delta)$$

where $U_n(\cdot)$ is the empirical cdf of the uniform [0,1] random
variate and $\alpha = 1 - P_n(\gamma,\delta)$ can be interpreted as a
crossing probability. Specifically,

$$1 - P_n(\gamma,\delta) = P[U_n(\cdot) \text{ crosses } y(x) = x/\gamma + \delta/(n\gamma)]$$   (2.4)

The closed formula of $P_n(\gamma,\delta)$ in terms of
$(n,\delta,\gamma)$ is given by Dwass (1959) and by Durbin (1973):


n -ny+6 


Pn(Y.6) = 1 - I 

" j=[l+(5/nY)] 


(nYj - 6)^(n^ - nYj + 6)"“j‘^ 


(2.5) 


2 2 

for n (y - 1) ^ 6 < n and Y > 1 where the notation [z] stands for 
the largest integer $\le z$. In later sections we return to the
generalized K-S confidence region and establish an approximation to it
based on the Poisson process. In the last section of this paper we have
graphically displayed some confidence regions of the forms
$F(x) \ge \gamma F_n(x) - \delta/n$ and
$F(x) \le \gamma F_n(x) + \delta/n$, $\forall x$, $\gamma \ge 1$,
$\delta > 0$.


2.2 Other Nonconstant Width Confidence Regions 

K-S Distance with a Particular Weight Function. This distance is
defined by Anderson and Darling (1952) as

$$K_{n,w} = \sup_{x:\,0<F(x)<1} |F_n(x) - F(x)| \cdot W[F(x)]$$   (2.6)

where $W[\cdot]$ is a nonnegative weight function. When a suitable weight
function is chosen, many distribution-free distances are reduced to
$K_{n,w}$. For example, $W(y) = 1$ leads to the two-sided K-S distance.
With the weight function $W[y] = [y(1-y)]^{-1/2}$ this distance can
provide a two-sided confidence region discussed in Doksum (1977). Consider
the following normalized version of $K_{n,w}$:

$$D_{n,w} = \sqrt{n}\,K_{n,w} = \sup_{x:\,0<F(x)<1} \frac{\sqrt{n}\,|F_n(x) - F(x)|}{\sqrt{F(x)(1-F(x))}}$$   (2.7)

The two-sided confidence region using $D_{n,w}$ can be obtained by noting

$$P[D_{n,w} \le d(\alpha,n,w)] = 1 - \alpha .$$

Following Doksum (1977), this can be written as:

$$P\{(1+a)F^2(x) - [2F_n(x) + a]\,F(x) + F_n^2(x) \le 0,\ \forall x\} = 1 - \alpha$$

where $a = d^2(n,\alpha,w)/n$, or equivalently

$$P\left\{ \frac{2F_n(x) + a + \sqrt{\Delta[F_n(x)]}}{2(1+a)} \ge F(x) \ge \frac{2F_n(x) + a - \sqrt{\Delta[F_n(x)]}}{2(1+a)},\ \forall x \right\} = 1 - \alpha$$   (2.8)

where $\Delta[F_n(x)] = -4aF_n^2(x) + 4aF_n(x) + a^2$.

Noe (1972) obtained a truncated power series which approximates
$\alpha$ for a given $n$ and $d$. The inversion of $d$ in terms of
$\alpha$ for the statistic in (2.7) is

d"^ = 2"^a - 2"^ (3 - 5n"^)a^ - 2"^ (14 - 132n"^ + ll4n“^)a"^ 

- 2"^ (151 - 4035n”^ + 1298ln"^ - 9105n”^)a”^ . (2.9) 

Neither the general term nor a general truncation error bound is known.
In practice, to construct the confidence region by using $D_{n,w}$ one
chooses a level of significance $1 - \alpha$, then by means of the
truncated series (2.9) determines the corresponding value of
$a = d^2/n$. Thus one obtains the two jagged shaped bounds $c_1$ and
$c_2$ whose equations are given in the probabilistic equation (2.8).
Figure 1 shows a realization of a sample of size 20 from the uniform
distribution [0,1] with its bounds for 95% confidence. In Figure 2 the
sample size is set to be $n = 1000$. A comparison of these two figures
shows that for a larger sample size, both bounds "come in" at both tails.
The limitation of using this distance as a solution to our problem is
that one can obtain only an approximate confidence region. Moreover,
Canner (1975) has noted that this distance is very sensitive to the first
and last order statistics; this implies that the confidence interval is
very narrow in the tails at the expense of the center of the distribution.
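
The two bounds in (2.8) are the roots of the quadratic $(1+a)F^2 - (2F_n + a)F + F_n^2 = 0$, so they can be evaluated directly once $a$ is known. The sketch below does that for a single value of $F_n(x)$, and also computes the weighted statistic for data already transformed to the uniform scale. The function names are hypothetical, the $\sqrt{n}$ normalization is an assumption chosen to agree with $a = d^2/n$, and the value of $a$ used in the example is arbitrary, not a critical value from (2.9).

```python
from math import sqrt


def weighted_ks_statistic(uniform_sample):
    """sqrt(n) * sup |F_n(u) - u| / sqrt(u(1-u)) for data on the uniform scale.

    Between jumps the ratio is monotone in u, so the supremum is approached
    at the order statistics (from the left or from the right).
    """
    us = sorted(uniform_sample)
    n = len(us)
    best = 0.0
    for i, u in enumerate(us, start=1):
        if 0.0 < u < 1.0:
            s = sqrt(u * (1.0 - u))
            best = max(best, abs(i / n - u) / s, abs((i - 1) / n - u) / s)
    return sqrt(n) * best


def band_2_8(fn_value, a):
    """Lower and upper bounds of (2.8) for one value of F_n(x)."""
    disc = -4 * a * fn_value ** 2 + 4 * a * fn_value + a * a  # Delta[F_n(x)]
    root = sqrt(disc)
    denom = 2 * (1 + a)
    return (2 * fn_value + a - root) / denom, (2 * fn_value + a + root) / denom


# Arbitrary illustrative value of a; not derived from the series (2.9).
print(band_2_8(0.5, 0.1))
print(band_2_8(0.0, 0.1))
```

At $F_n(x) = 0$ the band collapses to $[0, a/(1+a)]$, which illustrates how narrow the region becomes in the tails relative to the center.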

Figure 1. A realization of sample size n = 20 from U[0,1]
together with the 95% confidence region based on $D_{n,w}$.

A Modified K-S Confidence Region Based on Censoring. When a K-S
confidence region is constructed using truncated or censored data,
the actual region would be a fixed-width band over the right tail, since
in this case it is not required to remain within the band beyond the
$f$th failure in an "$f$ out of $n$ censored" plan. Consider the
following distance given in Barr and Davidson (1973):

$$T_{f,n} = \sup_{x:\,0<F(x)\le(f/n)} |F(x) - F_n(x)|$$   (2.11)

The two-sided confidence region using $T_{f,n}$ can be obtained by
noting that

$$P[T_{f,n} \le d(f,n,\alpha)] = 1 - \alpha$$

where the critical values $d(f,n,\alpha)$ are tabulated for some values
of $n$ [Barr and Davidson (1973)]. Later Koziol and Byar (1975) provided
the asymptotic critical values as $n$ approaches infinity. A good
approximation formula for significance points is given by Dufour and Maag
(1978) when sample size exceeds 25. Figure 3 shows a typical confidence
region using a sample from a uniform distribution based upon the
distance defined by (2.11).

Manija Confidence Region. Manija (1949) introduced the following
distance:

$$d_n(a,b) = \sup_{x \in S}\,[F_n(x) - F(x)], \quad a < b$$

where

$$S = \{x \mid F(x) \le a\} \cup \{x \mid F(x) \ge b\} .$$

By the general distribution-free argument, the distribution of
$d_n(a,b)$ is independent of $F(\cdot)$, $F$ continuous over the set $S$.
A lower confidence region using this distance can be obtained by noting
that

$$P[U(x) \ge U_n(x) - z(a,b,n) \text{ for all } x \in S] = 1 - \alpha$$

based upon a uniform empirical cdf path. A typical lower confidence
region based on this distance using a sample from U[0,1] is shown in
Figure 4. The limitation of applying this distance in construction of a
confidence region is that it provides an approximate region only. The
asymptotic distribution of $d_n(a,b)$ is available, and is given in
Sahler (1968).

Tang Confidence Region. Tang (1962) developed a distance based
upon the ratio between the empirical and hypothetical cdf's. This
distance is a special case of the Renyi (1953) distance. The Tang
distance is defined for $0 < b \le n$ as

$$r_n(b) = \sup_{x:\,0<F(x)\le(b/n)} \frac{F_n(x)}{F(x)}$$

By the usual distribution-free argument, $r_n(b)$ has a distribution
independent of $F(\cdot)$, if $F(\cdot)$ is continuous over the set

$$S = \{x \mid 0 < F(x) \le b/n \le 1\} .$$

A one-sided confidence region using $r_n(b)$ can be obtained by noting
that

$$P\left[ F(x) \ge \frac{F_n(x)}{d(\alpha,n,b)} \text{ for all } x \in S \right] = 1 - \alpha .$$

Figure 5 shows a typical confidence region using a uniform empirical cdf
based on the $r_n(b)$ distance. The distribution of $r_n(b)$ in closed
form is available but it is not easy to implement.


3. THEORY OF POISSON APPROXIMATION TO 
GENERALIZED K-S PROBABILITIES 

For generalized K-S confidence regions it is necessary to calculate
the crossing probabilities $1 - P_n(\gamma,\delta)$ from equation (2.5).
For a given confidence level $(1 - \alpha)$ it is necessary to find
solutions of the equation $\alpha = 1 - P_n(\gamma,\delta)$. To avoid
these computational difficulties and, moreover, to avoid generating very
extensive tables with the three parameters $n$, $\gamma$, and $\delta$,
we have developed a conservative bound on $P_n(\gamma,\delta)$ which is
quite accurate and easy to compute based on the Poisson process. The
theory for this approximation is presented in the four theorems of this
section.

Figure 4. A typical lower confidence region based on Manija
distance using sample size n from U[0,1].

Let $\{X(t), 0 \le t\}$ be a homogeneous Poisson process with unit
rate. Let $\{U_n(t), 0 \le t \le 1\}$ be the empirical cdf of a sample
of $n$ U[0,1] random variables.

Theorem 1 For $0 < \gamma < 1$ and $\delta > 0$,
$P\{nU_n(t/n) \le t/\gamma + \delta/\gamma,\ 0 \le t \le n\}$ decreases
monotonically as $n$ increases. Furthermore,

$$\lim_{n\to\infty} P\{nU_n(t/n) \le t/\gamma + \delta/\gamma,\ 0 \le t \le n\} = P\{X(t) \le t/\gamma + \delta/\gamma,\ 0 \le t\}$$   (3.1)

= (1 - y) I exp(6 - yj) 

j=[5/y]+l 

Proof Dwass (1974) shows that, for $c > 1$ and $d > 0$,

$$P\{nU_n(t/n) \le ct + d,\ 0 \le t \le n\} = P\left\{\sum_{i=1}^{N_n} U_i \le d\right\}$$   (3.2)

where $U_i$, $i = 1,2,\dots,n$, are i.i.d. U[0,1] random variables and
$N_n$ is an independent random variable with

i^J k = 0.1.2. ••• (3.3) 

and

$$P\{X(t) \le ct + d,\ 0 \le t\} = P\left\{\sum_{i=1}^{M} U_i \le d\right\}$$   (3.4)

where $M$ is an independent random variable with

$$P(M \ge k) = (1/c)^k, \quad k = 0,1,2,\dots .$$   (3.5)

From (3.3) it follows that

$$N_n \overset{st}{\le} N_{n+1} .$$   (3.6)

It follows from (3.3) and (3.5) that

$$\lim_{n\to\infty} P(N_n \ge k) = P(M \ge k), \quad k = 0,1,\dots .$$   (3.7)

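Define $f_d(m) = P\left\{\sum_{i=1}^{m} U_i > d\right\}$; $f_d$ is a
monotonically nondecreasing function such that

$$P\left\{\sum_{i=1}^{M} U_i > d\right\} = \sum_{m=0}^{\infty} P\left\{\sum_{i=1}^{M} U_i > d \,\Big|\, M = m\right\} P(M = m) = \sum_{m=0}^{\infty} f_d(m)\,P(M = m) = E\{f_d(M)\} .$$   (3.8)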


It follows from (3.6) and the increasing nature of $f_d$ that
$f_d(N_n) \overset{st}{\le} f_d(N_{n+1})$, which implies that
$E(f_d(N_n)) \le E(f_d(N_{n+1}))$ which, together with (3.2) and (3.8),
proves the monotonicity in the statement of the theorem. It follows from
the discreteness of the random variables and (3.7) that
$\lim_{n\to\infty} f_d(N_n) = f_d(M)$ in distribution, from which
$\lim_{n\to\infty} E(f_d(N_n)) = E(f_d(M))$ follows by dominated
convergence; this, together with (3.2), (3.4), and (3.8), proves the
limiting result in (3.1). Finally, the second equality in (3.1) is given
by Pyke (1959). //


Theorem 1 provides the necessary theory regarding crossing of
upper lines, $y(t) = (1/\gamma)t + (\delta/n\gamma)$, and Theorems 2, 3,
and 4 deal with the lower lines,
$y(t) = (1/\gamma)t - (\delta/n\gamma)$.

Theorem 2 For $\gamma \ge 1$ and $\delta > 0$,
$P\{nU_n(t/n) \ge t/\gamma - \delta/\gamma,\ 0 \le t \le n\}$
is monotone nonincreasing in $n$.


Proof Define $c = 1/\gamma$, $d = \delta/\gamma$, and
$V_n(t) = nU_n(t/n)$, $0 \le t \le n$. Let
$P_n(c,d) = P\{V_n(t) \ge ct - d,\ 0 \le t \le n\}$; then, for
$0 < c \le 1$ and $d > 0$, we must verify

$$P_m(c,d) \ge P_n(c,d), \quad m < n .$$   (3.9)

The proof is based on induction. We first verify (3.9) for $m = 1$:
Letting $S_{n,j} = \min\{t: V_n(t) = j\}$,
$P_1(c,d) = P(S_{1,1} \le d/c) = d/c$ if $d/c < 1$, and 1 otherwise. For
$n > 1$, if $d/c < 1$,
$P_n(c,d) \le P(S_{n,1} \le d/c) = 1 - P(S_{n,1} > d/c) = 1 - (1 - d/(cn))^n \le d/c$.
This verifies (3.9) for $m = 1$.

We now make the inductive hypothesis that (3.9) holds for $m < k$.
To complete the proof it suffices to show that this implies that (3.9)
is true for $m = k$. This inductive step of the proof uses, for fixed
$k$ and $n$, dependent versions of $\{V_k(t), 0 \le t \le k\}$ and
$\{V_n(t), 0 \le t \le n\}$ defined on the same probability space.

The process $\{V_m(t), 0 \le t \le m\}$ is a pure birth process with
initial distribution $P\{V_m(0) = 0\} = 1$ and transition rate function
$\lambda_m(i,j;t) = (m-i)/(m-t)$ if $j = i + 1$, and 0 otherwise,
$i \ne j$, $0 \le t < m$. Let $D_k = \min\{t > 0: V_k(t) = t\}$. We shall
define a modified version of $V_k$ which jumps to $\infty$ when it
crosses the diagonal:

(V'> • 

v.U)=j^ . 

This process has transition function 
j=i+l,i+l<t^k or if j= 


0 < t < D, 
k 

D, < t < k 
k 

= (k-i)/(k-t) 

00 , t < i + 1 , and = 0 


if 

otherwise. 
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Note that X,’ (t) > X' (t) , 0 < t < k , if k < n . This implies that 
K n 

(V'(t) , 0 < t < k} V ^V’ (t) , 0 < t < k} and furthermore that V' 

K n K. 

and can be defined on the same probability space so that 

PCv^(t) ^ 0 ^ t < k) - 1 [see Kamae, Krengel, and O'Brien (1977) 

and references therein] . From this we get the joint distribution of Dj^ 

and • Using these distributions we define {V|J(t), < t < k} 

and {V||(t), < t < n} : Given {Dj^ = s} , V|^ is a birth process 

with initial state V|^(s) “ j • where j*min(i: i > s) and transition 

rate function X||(i,j;t) = , s < t < k . Given {Dj^ * s, 

V (D, ) = i} , V" is a birth process with initial state V"(s) * i and 
n k n n 

transition rate function X|J(l,j;t) ■ X^(i>j;t) , s < t ^ n . Further- 
more, {Cvj^(t), V^(t)}, 0 < t < Dj^} , (v;;(t), Dj^ < t < k} . and {VjJCt) , 

< t < n} are conditionally independent given 

. Let Vj^(t) = V^(t) , 0 < t < , and V|^(t) , ^ t < k . 

Let V (t) = V (t) , 0 ^ t < D. , and V"(t) , D. < t < n . These de- 

H H R H K 

pendent processes, and , will be used in the induction step of 

the proof. 

(For the sake of completeness, we give an explicit construction 

which can be shown to yield the above (Vj^, V^) ; Let ^ , 0 < i ^ k-1 , 

and X .,0<i<n'l be k+n i.i.d. uniform [0,1] random variables 
n,i 

defined on the same probability space. Let G^ ^(yjs) = 1 - 

expf - Xjj^(i,i+l;t)dt| be the cdf of the holding time of in 


state i given the passage to 

i occurs at s, 

m = k , 0 < i ^ k-1 

, and 

m = n , 0 < i < n-1 . We first construct 

{Vj^(t), 0 < t < k} . 

Let 

^k,0 “ ° ’ \,0 “ ^k,0^\,0 

1 ^k,0^ ’ \,1 

“ \,0’ \,i “ 


S,i^\,i I ^k,i^ ’ ^k,i+l ’ 

^k.i ^k,i ’ 

•** ’ ^k,k " ^k,k-l 

\.k- 
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Let " max(i: ,0<t<k. This defines Vj^ and also 

■ min^t > 0: " t3 • Now let us consider 0 < t < n} . 

On the interval [0,Dj^] , and must be ordered with probability 

. = G“\(X, . I S .) , if S . . 1 - S . + Y . < D. , 
n,i n,i Tt,i ' n,i n,i+l n,l n,l k 

0 < i < k . Let j = minCi: ^ i I ®n i^ ^ ^k^ ’ define 

Y . = g"^ . (X .Id.) and S = D, + Y . . Then for j < H < n-1 

n,j n,j’‘ n,j ' k'' n,j+l k n,j 

let Y_ 0 = gI^o(X_ 0 I S„ o) and . As before, let 


one. Let Y 


n,Jl n,f,^ n,f. ' n,A 


V (t) = max(i: S . < t) . It can be shown that the processes V, and 

n n, 1 K. 

V have the properties claimed in the preceding paragraph by appealing 
n 


to standard construction techniques such as are found in Heyman and 


Sobel (1982, Ch. 4) ard comparison techniques such as those found in 

Kamae, et dl. (1977) and Stoyan (1977).) 

Now consider the dependent processes $\{V_k(t), 0 \le t \le k\}$ and
$\{V_n(t), 0 \le t \le n\}$ constructed above. Letting
$G(s,i) = P\{D_k \le s,\ V_n(D_k) \le i\}$,

$$P_n(c,d) = \int P\{V_n(t) \ge ct - d,\ 0 \le t \le n \mid D_k = s,\ V_n(D_k) = i\}\,dG(s,i)$$

$$= \int P\{V_n(t) \ge ct - d,\ 0 \le t \le s \mid D_k = s,\ V_n(D_k) = i\} \cdot P\{V_n(t) \ge ct - d,\ s < t \le n \mid D_k = s,\ V_n(D_k) = i\}\,dG(s,i)$$   (3.10)

and

$$P_k(c,d) = \int P\{V_k(t) \ge ct - d,\ 0 \le t \le s \mid D_k = s,\ V_n(D_k) = i\} \cdot P\{V_k(t) \ge ct - d,\ s < t \le k \mid D_k = s,\ V_n(D_k) = i\}\,dG(s,i) .$$   (3.11)

The inequality
$P\{V_k(t) \ge ct - d,\ 0 \le t \le s \mid D_k = s,\ V_n(D_k) = i\} \ge P\{V_n(t) \ge ct - d,\ 0 \le t \le s \mid D_k = s,\ V_n(D_k) = i\}$
follows from the ordering between $V_k$ and $V_n$. Thus to prove (3.9)
for $m = k$ from (3.10) and (3.11), it suffices to verify

$$P\{V_k(t) \ge ct - d,\ s < t \le k \mid D_k = s,\ V_n(D_k) = i\} \ge P\{V_n(t) \ge ct - d,\ s < t \le n \mid D_k = s,\ V_n(D_k) = i\} .$$   (3.12)

However, the left-hand side of (3.12) is independent of $i$ and the
right-hand side achieves its largest value for $i = j-1$, where
$j = \min(\ell: \ell \ge s)$, thus it suffices to verify

$$P\{V_k(t) \ge ct - d,\ s < t \le k \mid D_k = s,\ V_k(s) = j\} \ge P\{V_n(t) \ge ct - d,\ s < t \le n \mid D_k = s,\ V_n(s) = j - 1\} .$$   (3.13)


The probability expressions in (3.13) are equivalent to those involving 
processes with a fewer number of transitions: The right-hand side may 

be evaluated by labeling (s,j-l) as the origin and recognizing that in 
the remaining interval of length n - s , the process is equivalent to 
counting n - j + 1 order statistics. With the appropriate scaling 
this gives 


> ct - d, s < t < n 1 = s, V^(s) = j-l^ 

V^_j^l(t) > yj^(t) = t + cs-d-j+1, 0 < t < n-j+lj 


(3.14) 


= P 


Similarly, for the left-hand side of (3.13), 

>ct-d, s<t<k I Dj^ = s, Vj^(s) = 


= P[vj^_j(t) > V 2 (t) = t + cs-d-j, 0 < t < k-j 

> PC\_j(t) > yj^(t), 0 < t < k-j] 


(3.15) 


the last inequality following from $y_1(t) \ge y_2(t)$,
$0 \le t \le k-j$. If $y_1(0) > 0$, then (3.14) equals zero and (3.13)
follows trivially. If $y_1(0) \le 0$, we use the facts that the slope of
$y_1(\cdot)$ is less than 1, $m = k-j < k$ and $n-j+1 > k-j$, to invoke
the inductive hypothesis, proving (3.13). This completes the proof of
Theorem 2. //


Theorem 3 For $\gamma \ge 1$ and $\delta > 0$,

$$P\{nU_n(t/n) \ge t/\gamma - \delta/\gamma,\ 0 \le t \le n\} \ge P\{X(t) \ge t/\gamma - \delta/\gamma,\ 0 \le t\}$$   (3.16)

Proof The proof of Theorem 3 parallels that of Theorem 2: Define
$c = 1/\gamma$, $d = \delta/\gamma$, and $V_n(t) = nU_n(t/n)$,
$0 \le t \le n$. Let $P_n(c,d) = P\{V_n(t) \ge ct - d,\ 0 \le t \le n\}$,
and let $P_\infty(c,d) = P\{X(t) \ge ct - d,\ 0 \le t\}$.
Note that $P_1(c,d) \ge P_\infty(c,d)$ follows from the fact that a
uniform [0,1] random variable is stochastically less than an exponential
random variable with mean 1. Next, make the inductive hypothesis that
$P_n(c,d) \ge P_\infty(c,d)$, $n < k$. Consider the process $V_k$,
letting $D_k = \min\{t > 0: V_k(t) = t\}$, and define the process
$V'_k(t) = V_k(t)$, $0 \le t < D_k$, $= \infty$, for $D_k \le t \le k$.
Note that the transition function of $V'_k$ is greater than the
transition function of the Poisson process $X$; thus, it is possible to
construct dependent versions of $V'_k$ and $X$ such that
$P\{V'_k(t) \ge X(t),\ 0 \le t \le k\} = 1$ and from this get a
construction of $V_k$ and $X$ such that
$P\{V_k(t) \ge X(t),\ 0 \le t \le D_k\} = 1$. Using analogs of
(3.10) and (3.11) we obtain

$$P\{V_k(t) \ge ct - d,\ 0 \le t \le s \mid D_k = s,\ X(D_k) = i\} \ge P\{X(t) \ge ct - d,\ 0 \le t \le s \mid D_k = s,\ X(D_k) = i\}$$

from the construction on $[0,D_k)$ and note that it suffices to
demonstrate

$$P\{V_k(t) \ge ct - d,\ s < t \le k \mid D_k = s,\ X(D_k) = i\} \ge P\{X(t) \ge ct - d,\ s < t \mid D_k = s,\ X(D_k) = i\}$$   (3.17)

to complete the proof. The right-hand side of (3.17) is less than or
equal to

$$P\{X(t) \ge ct - d,\ s < t \mid X(s) = j - 1\} = P\{X(t) \ge y_3(t) = ct + cs - d - j + 1,\ 0 \le t\}$$   (3.18)

where $j = \min(\ell: \ell \ge s)$. The left-hand side of (3.17) equals
PCVj^(t) > ct - d, s < t < k I « s, Vj^(s) - j3 


- PCv^,j(t) > y 2 (t) - t + cs - d - j, 0 < t < k-j} (3.19) 

> > ^3^*"^* 0 < t < k-j) . 

Equations (3.18) and (3.19) combined with the inductive hypothesis
verify (3.17), completing the proof of Theorem 3. //


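Theorem 4 For $\gamma \ge 1$, $\delta > 0$,

$$\lim_{n\to\infty} P\{nU_n(t/n) \ge t/\gamma - \delta/\gamma,\ 0 \le t \le n\} = P\{X(t) \ge t/\gamma - \delta/\gamma,\ 0 \le t\}$$   (3.20)

$$= 1 - \exp(-\delta z/\gamma),$$

where $z$ is the nonnegative root of the equation

$$\gamma(1 - e^{-z}) = z$$   (3.21)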
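
The limiting crossing probability $\alpha = \exp(-\delta z/\gamma)$ that follows from Theorem 4 is easy to evaluate numerically. The sketch below finds the positive root of (3.21) by bisection and returns $\alpha$; the function names and the tolerance are illustrative assumptions. For $\gamma > 1$ the function $g(z) = \gamma(1 - e^{-z}) - z$ is concave with $g(0) = 0$, $g'(0) > 0$, and $g(\gamma) < 0$, so exactly one positive root lies in $(0, \gamma)$.

```python
from math import exp


def positive_root(gamma, tol=1e-12):
    """Positive root z of gamma*(1 - exp(-z)) = z, assuming gamma > 1."""
    if gamma <= 1.0:
        return 0.0  # only the trivial root z = 0 remains
    lo, hi = 0.0, gamma
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        # g(mid) > 0 means mid is still to the left of the positive root.
        if gamma * (1.0 - exp(-mid)) - mid > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def poisson_alpha(gamma, delta):
    """Limiting crossing probability exp(-delta*z/gamma) from Theorem 4."""
    return exp(-delta * positive_root(gamma) / gamma)


# Parameter values taken from the worked examples in Section 5.
print(poisson_alpha(2.0, 3.76))
print(poisson_alpha(1.5, 5.14))
```

Both calls return values very close to .05, in agreement with the 95% confidence quoted for those examples.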


Proof Given $\epsilon > 0$, let $k_\epsilon$ be an integer such that

$$P\{X(t) \ge t/\gamma - \delta/\gamma,\ 0 \le t \le \gamma k_\epsilon + \delta\} \le P\{X(t) \ge t/\gamma - \delta/\gamma,\ 0 \le t\} + \epsilon$$   (3.22)

This follows from Pyke (1959, Theorem 2, equation 9). Define
$U_{n,i} = \min\{t: nU_n(t/n) = i\}$, $i = 1,2,\dots,n$, and
$X_i = \min\{t: X(t) = i\}$, $i = 1,2,\dots,n$. It follows from Miller
(1976) that the joint distributions of
$\{U_{n,i},\ i = 1,2,\dots,\min(n,k_\epsilon)\}$ converge to that of
$\{X_i,\ i = 1,2,\dots,k_\epsilon\}$. This implies

$$\lim_{n\to\infty} P\{nU_n(t/n) \ge t/\gamma - \delta/\gamma,\ 0 \le t \le n\} \le \lim_{n\to\infty} P\{nU_n(t/n) \ge t/\gamma - \delta/\gamma,\ 0 \le t \le \min(n,\gamma k_\epsilon + \delta)\}$$   (3.23)

$$= P\{X(t) \ge t/\gamma - \delta/\gamma,\ 0 \le t \le \gamma k_\epsilon + \delta\} .$$

Equations (3.22) and (3.23) imply that

$$\lim_{n\to\infty} P\{nU_n(t/n) \ge t/\gamma - \delta/\gamma,\ 0 \le t \le n\} \le P\{X(t) \ge t/\gamma - \delta/\gamma,\ 0 \le t\} + \epsilon .$$   (3.24)

Theorem 3 and (3.24) verify the limit in (3.20). The second equality
in (3.20) is given by Pyke (1959). //


4. IMPLEMENTATION AND SOME NUMERICAL RESULTS 

In the following we describe in more detail some of our findings
related to Theorems 2, 3, and 4. The goal is to construct
confidence regions of the form
$F(\cdot) \le \gamma F_n(\cdot) + \delta/n$, $\gamma \ge 1$. Some
numerical results are provided, together with some examples of how these
results are used. In the construction of an upper confidence region,
one is interested in at least one of the following problems.

(i) Given the desired shape of an upper confidence region of the form
$F(\cdot) \le \gamma F_n(\cdot) + \delta/n$, given $(\delta,\gamma,n)$,
what is the significance level $\alpha$ of such a confidence region?

(ii) Given $(n,\gamma,\alpha)$, find a $\delta$ such that the
corresponding upper confidence region has $100(1-\alpha)\%$ confidence.

(iii) What value of $\gamma$ can ensure that the upper confidence region
of the form above has $100(1-\alpha)\%$ confidence?

(iv) What is the smallest sample size $n$ necessary to ensure that
$\delta' + \gamma F_n(x)$, given $(\delta',\gamma)$, is an upper
confidence region with $100(1-\alpha)\%$ confidence, where
$\delta' = \delta/n$?

The "exact" solutions to all these problems can be found using
formula (2.5). Due to the computational difficulties in implementing
such a formula and, moreover, the need to construct several extensive
tables for each problem, we use Theorem 3, which shows that a
conservative approximate solution for all these problems is possible.
Moreover, by Theorems 2 and 4 the error committed by this approximation
goes to zero monotonically as the sample size grows larger. Table 1
shows the numerical results for some values of $\delta$, $\gamma$, and
$n$ as computed by formula (2.5). The last column of this table provides
the Poisson approximation computed by formula (3.20) of Theorem 4. We
notice that these "exact" values of $\alpha$ converge to their Poisson
approximations as $n$, the sample size, increases, as expected. The
curves of Figure 6 are derived from additional computation by formulas
from Theorem 4. Specifically, for a given $\delta$ and $n$ find a
$\gamma$ such that the upper confidence region has at least
$100(1-\alpha)\%$ confidence. By Theorem 4 one can approximate $\alpha$
as

$$\alpha \approx \exp(-\delta z/\gamma)$$

where $z$ is the nonnegative root of the equation (3.21). Thus after
some manipulation, one obtains

$$\gamma = \frac{\delta}{\log(\alpha)}\,\log\left[\frac{\delta + \log(\alpha)}{\delta}\right] .$$   (4.1)
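
The "manipulation" leading to (4.1) can be spelled out as follows. This is a reconstruction from $\alpha \approx \exp(-\delta z/\gamma)$ and equation (3.21), and it assumes $\delta + \log\alpha > 0$ so that the logarithm is defined.

```latex
\begin{align*}
\alpha = e^{-\delta z/\gamma}
  &\;\Longrightarrow\; z = -\frac{\gamma \log\alpha}{\delta}, \\
\gamma\,(1 - e^{-z}) = z
  &\;\Longrightarrow\; 1 - e^{-z} = \frac{z}{\gamma} = -\frac{\log\alpha}{\delta}
  \;\Longrightarrow\; z = -\log\!\left[\frac{\delta + \log\alpha}{\delta}\right], \\
\text{hence}\quad
\gamma &= -\frac{\delta z}{\log\alpha}
        = \frac{\delta}{\log\alpha}\,
          \log\!\left[\frac{\delta + \log\alpha}{\delta}\right].
\end{align*}
```

Since $\log\alpha < 0$ and the bracketed ratio lies in $(0,1)$, the two negative factors cancel and $\gamma$ is positive; as $\delta \to \infty$ the expression tends to 1.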

The above results are used in the next section, where we provide some
real world applications. Similar results can be developed by
implementing Theorem 1, but since we have no compelling application for
these cases, we have not pursued their implementation.

Figure 6. Poisson approximation to the significance level $\alpha$, for
the confidence region of form
$F(\cdot) \le \gamma F_n(\cdot) + \delta/n$.
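
One way to see the conservativeness and the convergence numerically, without formula (2.5), is a small Monte Carlo experiment on uniform samples. The sketch below estimates the finite-$n$ probability that the region $F(x) \le \gamma F_n(x) + \delta/n$ fails somewhere; the function name, the number of replications, and the seed are illustrative assumptions, and the result is a simulation estimate, not the "exact" value tabulated in the paper.

```python
import random


def crossing_probability_mc(n, gamma, delta, reps=4000, seed=1):
    """Monte Carlo estimate of P(F(x) > gamma*F_n(x) + delta/n for some x).

    By the distribution-free argument it is enough to simulate U[0,1] data.
    Just before the (j+1)-th order statistic F_n equals j/n, so the band is
    violated exactly when that order statistic exceeds (gamma*j + delta)/n.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        u = sorted(rng.random() for _ in range(n))
        if any(u[j] > (gamma * j + delta) / n for j in range(n)):
            hits += 1
    return hits / reps


# gamma = 2 and delta = 3.76 correspond to the risk-analysis example.
for n in (50, 200):
    print(n, crossing_probability_mc(n, 2.0, 3.76, reps=2000))
```

Up to simulation noise, the estimates should sit at or slightly below the Poisson value $\exp(-\delta z/\gamma) \approx .05$ and move toward it as $n$ grows, which is the behaviour described by Theorems 2, 3, and 4.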


5. SOME APPLICATIONS 

Since the idea of constructing a confidence region over the cdf,
based on the generalized K-S distance, is new, we present some areas of
application where this idea is useful.

5.1 Application in Risk Analysis

Gross, Miller, and Soland (1980) studied and gave details of
confidence region construction of a risk profile defined as
$R(t) = 1 - F(t)$. In the following we take one of their examples and
apply our findings. Their data base is a typical simulated risk profile
based on a sample of 500 observations. It is desired to construct a
confidence region $R(t) \le 2R_n(t) + d$ with $1-\alpha = 95\%$
confidence. They utilized the formula given in (2.5) and obtained the
"exact" value $d = .0075$. Although their approach is straightforward,
it was necessary to write a large and tedious program to find the $d$
value. The desired region is equivalent to the upper confidence region
$2F_n(t) + d$. Using the result of Theorem 4 one obtains the following
relation from formula (4.1), with $\gamma = 2$ and $\alpha = .05$:

$$2 = \frac{\delta}{\log(\alpha)}\,\log\left[\frac{\delta + \log(\alpha)}{\delta}\right]$$

where $\log(\alpha)$ is the natural logarithm of $\alpha$. This
relatively easy equation can be solved by numerical methods. We employed
the method of binary search and obtained $\delta = 3.760$ and therefore
the conservative value for $d = \delta/n = .00752$. In fact, from
Figure 6 one can easily obtain an accurate enough solution. Figure 7
shows a typical simulated risk profile with its confidence region.
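
The binary search mentioned above can be sketched as follows. The code solves formula (4.1) for $\delta$ given $\gamma$ and $\alpha$; the function names, the bracketing strategy, and the tolerance are assumptions made for illustration, since the paper does not describe its search in detail. For fixed $\alpha$ the right-hand side of (4.1) decreases from $+\infty$ (as $\delta \downarrow -\log\alpha$) toward 1 (as $\delta \to \infty$), so a unique solution exists for any $\gamma > 1$.

```python
from math import log


def gamma_from(delta, alpha):
    """Formula (4.1); requires delta > -log(alpha)."""
    return delta * log((delta + log(alpha)) / delta) / log(alpha)


def solve_delta(gamma, alpha, tol=1e-9):
    """Binary search for the delta at which formula (4.1) returns gamma (> 1)."""
    lo = -log(alpha)           # (4.1) blows up as delta decreases to this value
    hi = lo + 1.0
    while gamma_from(hi, alpha) > gamma:   # expand until the target is bracketed
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gamma_from(mid, alpha) > gamma:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


delta = solve_delta(2.0, 0.05)
print(round(delta, 3), round(delta / 500, 5))
```

With $\gamma = 2$, $\alpha = .05$, and $n = 500$ this reproduces $\delta \approx 3.760$ and $d \approx .00752$ as quoted in the text.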

5.2 Application in Investment Modeling

The Investment Department of the World Bank developed a trading
strategy for U.S. Treasury Notes. The strategy aims to maximize the
rate of return from its investment in Treasury Notes. The basic idea
behind the strategy is trend-following. Turning points in the
movement of prices or yields can be identified as generating "buy" and
"sell" signals. "Buy" signals imply that Treasury Notes be bought for
all cash proceeds, and "sell" signals imply that all Treasury Notes held
be sold and all cash proceeds invested immediately in Federal funds
until the next "buy" signal. Federal funds represent money that banks
hold and which can be lent to other banks to fulfill their reserve
requirements. The interest rate that banks pay when they borrow Federal
funds is called the Federal funds rate; these loans usually are made on
an overnight basis.

When the "buy" and "sell" signals are generated from the
trend-following strategy, the rate of return is calculated on a
quarterly basis. These rates are then compared with those of some
"neutral" strategy, for example, the rate of return in a pure Federal
funds investment strategy; that is, investing all money in Federal funds
daily, on an overnight basis.

Figure 7. Simulated risk profile and confidence regions: risk profile
(-.-.-) based on 500 simulated observations; K-S 95% upper bound
$R_n + .056$ (-o-o-), generalized K-S 95% upper bound $2R_n + .0075$
(-x-x-) approximated by the Poisson process.

The differential rates of return, or the difference between
the rates of return from the trend-following strategy and the rates of
return from the Federal funds strategy, are calculated on a quarterly
basis. Some measures of performance are calculated. For example, if
the differential rate of return is positive, the trading strategy is
superior in that quarter; if it is negative, the Federal funds strategy
is superior. The measures of performance chart how much better or worse
the trading strategy performs than the Federal funds investment strategy
in the long run. The World Bank defines "reward" as the expected value
of positive differential return, and "risk" as the expected value of
negative differential return. Let

$r_d$ = differential rate of return;

then

Reward = $E[\max(0, r_d)]$

Risk = $E[\min(0, r_d)]$ .
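
Sample versions of these two quantities can be computed by replacing the expectations with averages over the observed quarters. The sketch below is only an illustration: the function name is hypothetical and the four return values are made up, not taken from Table II.

```python
def reward_and_risk(differential_returns):
    """Sample analogues of Reward = E[max(0, r_d)] and Risk = E[min(0, r_d)]."""
    n = len(differential_returns)
    reward = sum(max(0.0, r) for r in differential_returns) / n
    risk = sum(min(0.0, r) for r in differential_returns) / n
    return reward, risk


# Made-up quarterly differential returns (percent), for illustration only.
reward, risk = reward_and_risk([0.5, -0.25, 1.0, -0.25])
print(reward, risk)   # 0.375 -0.125
```

By construction the two quantities add up to the mean differential return, so reward and risk split the average performance into its favourable and unfavourable parts.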

Using the daily historical prices and yields from June 1974 through
December 1981, Table II can be obtained, where the quarterly rates of
return during this period are presented.

The World Bank is interested in constructing an upper confidence
region for the cdf of the differential rate of return of the following
form:

$$F(r_d) \le \gamma F_n(r_d) + \delta', \quad \gamma \ge 1,\ \delta' > 0$$

with 95% confidence. This can easily be done as follows. Let us, for
a given $\delta' = .17$, construct an upper confidence region of the
form (1.1) with $\alpha \le .05$. With $n = 30$ and $\delta = 5.14$
(so that $\delta/n \approx .17$) and using (4.1) we obtain

$$\gamma = \frac{\delta}{\log(\alpha)}\,\log\left[\frac{\delta + \log(\alpha)}{\delta}\right]$$

$$\gamma = 1.5 .$$
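
The value $\gamma = 1.5$ can be checked directly from formula (4.1), and the resulting bound can then be tabulated on the grid of possible empirical cdf values. In the sketch below the sample size $n = 30$ is inferred from the product "30(.17)" in the text, and the function name and the clipping at 1 are illustrative assumptions.

```python
from math import log


def gamma_from(delta, alpha):
    """Formula (4.1); requires delta > -log(alpha)."""
    return delta * log((delta + log(alpha)) / delta) / log(alpha)


n, delta, alpha = 30, 5.14, 0.05      # n = 30 is an inference from the text
gamma = gamma_from(delta, alpha)
print(round(gamma, 2))                # 1.5

# Upper bound gamma*F_n + delta/n at each attainable value F_n = j/n.
bound = [min(1.0, gamma * j / n + delta / n) for j in range(n + 1)]
print(bound[:4])
```

The bound starts at $\delta/n \approx .17$ where $F_n = 0$ and widens by a factor $\gamma$ as $F_n$ grows, so it is most informative for the most negative differential returns.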

The same result can be obtained directly from Figure 6. Thus,

$$1.5\,F_n(r_d) + .17, \quad -\infty < r_d < +\infty$$

is an upper bound for the cdf of the differential rate of return with at
least 95% confidence, with the characteristic of having more confidence
in risk taking circumstances. Figures 8 and 9 show the empirical cdf
obtained from Table II together with the upper confidence region
following two-year and five-year Treasury notes, respectively.

5.3 Reliability Estimation

Suppose an item has a lifetime distribution $F(t) = P(L \le t)$,
$t \ge 0$. In some contexts, such as the analysis of a pro-rated
warranty, it is desirable to have more accurate estimates in the left
tail of the distribution. This leads to a confidence interval of the
form:

$$F(t) \le \gamma F_n(t) + \delta, \quad t \ge 0$$

with $\gamma \ge 1$.

5.4 Recovery Times in Fault-tolerant Systems

Critical systems must often meet very high reliability
requirements. This high reliability is achieved by incorporating fault
tolerance into the system. [A typical application is flight-critical
avionics computers for aircraft, Hopkins, et al. (1978), and Wensley,
et al. (1978).] When a fault occurs in such a system the system must
detect it and take appropriate remedial action, reconfiguring itself so
that the offending component no longer has potential for contributing to
system failure. The length of time needed to achieve detection and
reconfiguration has a very strong influence on system reliability; thus
it is important to accurately estimate the recovery (or coverage) time
distribution $C(t) = P(T_R \le t)$, $t \ge 0$, from data which may be
obtained from bench tests, simulations, or actual operation. Since long
recovery times pose a much greater threat than shorter ones, a
confidence interval should take the form

$$C(t) \ge \gamma C_n(t) - \delta, \quad t \ge 0,$$

where $C_n(\cdot)$ is the empirical cdf and $\gamma \ge 1$.

Figure 8. An upper confidence region for cdf of differential rate of
return following five-year notes trend.

Figure 9. An upper confidence region for cdf of differential rate
of return following two-year notes trend.